Data Types and Objects
Perl has three data types: scalars, arrays of scalars, and
associative arrays of scalars. Normal arrays are indexed by
number, and associative arrays by string.
The interpretation of operations and values in perl sometimes
depends on the requirements of the context around the
operation or value. There are three major contexts: string,
numeric and array. Certain operations return array values
in contexts wanting an array, and scalar values otherwise.
(If this is true of an operation it will be mentioned in the
documentation for that operation.) Operations which return
scalars don't care whether the context is looking for a
string or a number, but scalar variables and values are
interpreted as strings or numbers as appropriate to the context.
A scalar is interpreted as TRUE in the boolean sense
if it is not the null string or 0. Booleans returned by
operators are 1 for true and 0 or '' (the null string) for
false.
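As a small illustration of the rules above, the following sketch (assuming
a Perl 5 interpreter; the variable names are made up for the example) uses
one scalar as both a number and a string, and then tests a few values for
truth:

```perl
use strict;
use warnings;

# The same scalar serves as a string or a number depending on the operator.
my $n      = "10";
my $sum    = $n + 5;      # numeric context
my $joined = $n . 5;      # string context

# The null string and 0 (also written as the string '0') are false.
my @truth = map { $_ ? 'true' : 'false' } ('', 0, '0', 'abc', 1);

# Booleans returned by operators: 1 for true, the null string for false.
my $yes = (1 == 1);
my $no  = (1 == 2);

print "$sum $joined @truth\n";   # 15 105 false false false true true
```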
There are actually two varieties of null string: defined and
undefined. Undefined null strings are returned when there
is no real value for something, such as when there was an
error, or at end of file, or when you refer to an uninitialized
variable or element of an array. An undefined null
string may become defined the first time you access it, but
prior to that you can use the defined() operator to determine
whether the value is defined or not.
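A minimal sketch of telling the two varieties of null string apart with
defined(), assuming a Perl 5 interpreter. The associative array %age and
its keys are invented for the example:

```perl
use strict;
use warnings;

my %age     = ('alice', 30);
my $known   = $age{'alice'};
my $missing = $age{'bob'};        # no such key: undefined value
my $empty   = '';                 # a defined null string

my @report;
push @report, defined($known)   ? 'known is defined'   : 'known is undefined';
push @report, defined($missing) ? 'missing is defined' : 'missing is undefined';
push @report, defined($empty)   ? 'empty is defined'   : 'empty is undefined';

print "$_\n" for @report;
```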
References to scalar variables always begin with '$', even
when referring to a scalar that is part of an array. Thus:
$days # a simple scalar variable
$days[28] # 29th element of array @days
$days{'Feb'} # one value from an associative array
$#days # last index of array @days
but entire arrays or array slices are denoted by '@':
@days # ($days[0], $days[1],... $days[n])
@days[3,4,5] # same as @days[3..5]
@days{'a','c'} # same as ($days{'a'},$days{'c'})
and entire associative arrays are denoted by '%':
%days # (key1, val1, key2, val2 ...)
Any of these eight constructs may serve as an lvalue, that
is, may be assigned to. (It also turns out that an assignment
is itself an lvalue in certain contexts--see examples
under s, tr and chop.) Assignment to a scalar evaluates the
righthand side in a scalar context, while assignment to an
array or array slice evaluates the righthand side in an
array context.
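The following sketch assigns to several of these constructs and shows how
the left side picks the context. It assumes a Perl 5 interpreter, and the
contents of @days and %days are made up for the example:

```perl
use strict;
use warnings;

my @days = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri');
my %days = ('Feb', 28, 'Mar', 31);

$days[0]           = 'Monday';                 # one array element: '$'
@days[3,4]         = ('Thursday', 'Friday');   # array slice: '@'
$days{'Feb'}       = 29;                       # one associative array value: '$'
@days{'Apr','May'} = (30, 31);                 # associative array slice: '@'

my $count   = @days;     # scalar on the left: right side in scalar context
my ($first) = @days;     # list on the left: right side in array context
```

Here $count receives the number of elements, while $first receives the
first element.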
You may find the length of array @days by evaluating
"$#days", as in csh. (Actually, it's not the length of the
array, it's the subscript of the last element, since there
is (ordinarily) a 0th element.) Assigning to $#days changes
the length of the array. Shortening an array by this method
does not actually destroy any values. Lengthening an array
that was previously shortened recovers the values that were
in those elements. You can also gain some measure of efficiency
by preextending an array that is going to get big.
(You can also extend an array by assigning to an element
that is off the end of the array. This differs from assigning
to $#whatever in that intervening values are set to null
rather than recovered.) You can truncate an array down to
nothing by assigning the null list () to it. The following
are exactly equivalent:
@whatever = ();
$#whatever = $[ - 1;
If you evaluate an array in a scalar context, it returns the
length of the array. The following is always true:
@whatever == $#whatever - $[ + 1;
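A short sketch of the last-subscript and length rules, assuming a Perl 5
interpreter and the default base index of 0 (so $[ is left alone). The
array @queue is invented for the example. It deliberately does not
re-lengthen a shortened array, since whether old values come back may
depend on the version of perl you run:

```perl
use strict;
use warnings;

my @queue      = ('a', 'b', 'c', 'd');
my $last_index = $#queue;        # subscript of the last element
my $length     = @queue;         # array in scalar context: its length

$#queue = 1;                     # shorten to two elements
my $after_truncate = "@queue";

$queue[4] = 'e';                 # assign off the end; the gap has no real values
my $gap_defined = defined($queue[3]) ? 1 : 0;

@queue = ();                     # truncate down to nothing
my $empty_last = $#queue;        # one less than the base index
```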
Multi-dimensional arrays are not directly supported, but see
the discussion of the $; variable later for a means of emulating
multiple subscripts with an associative array. You
could also write a subroutine to turn multiple subscripts
into a single subscript.
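As a rough sketch of the $; approach (assuming a Perl 5 interpreter; the
name %grid is hypothetical), a comma-separated subscript on an associative
array is joined into a single key:

```perl
use strict;
use warnings;

my %grid;
$grid{1,2} = 'x';                 # key is join($;, 1, 2)
$grid{3,4} = 'y';

my $sep  = $;;                    # the subscript separator
my $key  = join($sep, 1, 2);
my $same = $grid{$key};           # the value stored under $grid{1,2}

# Recover the individual subscripts from a stored key.
my @subs = split(/\Q$sep\E/, $key);
```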
Every data type has its own namespace. You can, without
fear of conflict, use the same name for a scalar variable,
an array, an associative array, a filehandle, a subroutine
name, and/or a label. Since variable and array references
always start with '$', '@', or '%', the "reserved" words
aren't in fact reserved with respect to variable names.
(They ARE reserved with respect to labels and filehandles,
however, which don't have an initial special character.
Hint: you could say open(LOG,'logfile') rather than
open(log,'logfile'). Using uppercase filehandles also
improves readability and protects you from conflict with
future reserved words.) Case IS significant--"FOO", "Foo"
and "foo" are all different names. Names which start with a
letter may also contain digits and underscores. Names which
do not start with a letter are limited to one character,
e.g. "$%" or "$$". (Most of the one character names have a
predefined significance to perl. More later.)
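To illustrate the separate namespaces, here is a sketch that uses one name
for a scalar, an array, an associative array and a subroutine. It assumes
a Perl 5 interpreter, and the name "total" is arbitrary:

```perl
use strict;
use warnings;

my $total = 10;
my @total = (1, 2, 3);
my %total = ('sum', 6);
sub total { my $t = 0; $t += $_ for @_; return $t; }

# Each reference picks its namespace from the leading character.
my $line = join(' ', $total, scalar(@total), $total{'sum'},
                &total(@total), $total[0]);

# Case is significant: $Total is unrelated to $total.
my $Total = 99;
```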
You can embed newlines directly in your quoted strings, i.e.
they can end on a different line than they begin. This is
nice, but if you forget your trailing quote, the error will
not be reported until perl finds another line containing the
quote character, which may be much further on in the script.
Variable substitution inside strings is limited to scalar
variables, normal array values, and array slices. (In other
words, identifiers beginning with $ or @, followed by an
optional bracketed expression as a subscript.) The following
code segment prints out "The price is $100."
$Price = '$100'; # not interpreted
print "The price is $Price.\n"; # interpreted
Note that you can put curly brackets around the identifier
to delimit it from following alphanumerics. Also note that
a single quoted string must be separated from a preceding
word by a space, since single quote is a valid character in
an identifier (see Packages).
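A brief sketch of the substitution rules, including curly brackets as a
delimiter. It assumes a Perl 5 interpreter; $fruit and @basket are names
made up for the example:

```perl
use strict;
use warnings;

my $fruit  = 'apple';
my @basket = ('pear', 'plum');

my $plural  = "Two ${fruit}s";          # brackets end the name before "s"
my $elem    = "First: $basket[0]";      # normal array value
my $slice   = "Both: @basket[0,1]";     # array slice
my $literal = 'No $fruit here';         # single quotes: not interpreted
```

Without the curly brackets, "$fruits" would refer to a different variable
named $fruits.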
Two special literals are __LINE__ and __FILE__, which
represent the current line number and filename at that point
in your program. They may only be used as separate tokens;
they will not be interpolated into strings. In addition,
the token __END__ may be used to indicate the logical end of
the script before the actual end of file. Any following
text is ignored (but may be read via the DATA filehandle).
The two control characters ^D and ^Z are synonyms for
__END__.
A word that doesn't have any other interpretation in the
grammar will be treated as if it had single quotes around
it. For this purpose, a word consists only of alphanumeric
characters and underline, and must start with an alphabetic
character. As with filehandles and labels, a bare word that
consists entirely of lowercase letters risks conflict with
future reserved words, and if you use the -w switch, Perl
will warn you about any such words.
Array values are interpolated into double-quoted strings by
joining all the elements of the array with the delimiter
specified in the $" variable, space by default. (Since in
versions of perl prior to 3.0 the @ character was not a
metacharacter in double-quoted strings, the interpolation of
@array, $array[EXPR], @array[LIST], $array{EXPR}, or
@array{LIST} only happens if array is referenced elsewhere
in the program or is predefined.) The following are
equivalent:
$temp = join($",@ARGV);
system "echo $temp";
system "echo @ARGV";
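The same joining rule can be seen without running a command. This sketch
assumes a Perl 5 interpreter, and @parts is a made-up array:

```perl
use strict;
use warnings;

my @parts   = ('usr', 'local', 'bin');
my $default = "@parts";                # joined with a space by default

my $path;
{
    local $" = '/';                    # change the delimiter in this block only
    $path = "/@parts";
}

my $restored = "@parts";               # delimiter is a space again
my $manual   = join(' ', @parts);      # same result, spelled out
```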
Within search patterns (which also undergo double-quotish
substitution) there is a bad ambiguity: Is /$foo[bar]/ to
be interpreted as /${foo}[bar]/ (where [bar] is a character
class for the regular expression) or as /${foo[bar]}/ (where
[bar] is the subscript to array @foo)? If @foo doesn't otherwise
exist, then it's obviously a character class. If
@foo exists, perl takes a good guess about [bar], and is
almost always right. If it does guess wrong, or if you're
just plain paranoid, you can force the correct interpretation
with curly brackets as above.
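Both forced interpretations can be sketched as follows, assuming a Perl 5
interpreter. The values given to $foo and @foo are invented, and a numeric
subscript stands in for "bar" in the subscript case:

```perl
use strict;
use warnings;

my $foo = 'ab';
my @foo = ('x', 'yz');

# ${foo} followed by the character class [bar]
my $class_hit  = ('abr' =~ /^${foo}[bar]$/) ? 1 : 0;
my $class_miss = ('yz'  =~ /^${foo}[bar]$/) ? 1 : 0;

# element 1 of array @foo
my $elem_hit   = ('yz'  =~ /^${foo[1]}$/)   ? 1 : 0;
```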
A line-oriented form of quoting is based on the shell here-is
syntax. Following a << you specify a string to terminate
the quoted material, and all lines following the current
line down to the terminating string are the value of the
item. The terminating string may be either an identifier (a
word), or some quoted text. If quoted, the type of quotes
you use determines the treatment of the text, just as in
regular quoting. An unquoted identifier works like double
quotes. There must be no space between the << and the identifier.
(If you put a space it will be treated as a null
identifier, which is valid, and matches the first blank
line--see Merry Christmas example below.) The terminating
string must appear by itself (unquoted and with no surrounding
whitespace) on the terminating line.
print <<EOF; # same as above
The price is $Price.
EOF
print <<"EOF"; # same as above
The price is $Price.
EOF
print << x 10; # null identifier is delimiter
Merry Christmas!

print <<`EOC`; # execute commands
echo hi there
echo lo there
EOC
print <<foo, <<bar; # you can stack them
I said foo.
foo
I said bar.
bar
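The effect of the quote type on the terminating string can be checked by
capturing the text in variables rather than printing it. This is a sketch
assuming a Perl 5 interpreter:

```perl
use strict;
use warnings;

my $Price = '$100';

# Double-quoted terminator: variables are substituted.
my $interpolated = <<"EOF";
The price is $Price.
EOF

# Single-quoted terminator: the text is taken as is.
my $verbatim = <<'EOF';
The price is $Price.
EOF

print $interpolated, $verbatim;
```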
Array literals are denoted by separating individual values
by commas, and enclosing the list in parentheses:
(LIST)
In a context not requiring an array value, the value of the
array literal is the value of the final element, as in the C
comma operator. For example,
@foo = ('cc', '-E', $bar);
assigns the entire array value to array foo, but
$foo = ('cc', '-E', $bar);
assigns the value of variable bar to variable foo. Note
that the value of an actual array in a scalar context is the
length of the array; the following assigns to $foo the value
3:
@foo = ('cc', '-E', $bar);
$foo = @foo; # $foo gets 3
You may have an optional comma before the closing
parenthesis of an array literal, so that you can say:
@foo = (
    1,
    2,
    3,
);
When a LIST is evaluated, each element of the list is
evaluated in an array context, and the resulting array value
is interpolated into LIST just as if each individual element
were a member of LIST. Thus arrays lose their identity in a
LIST--the list
(@foo,@bar,&SomeSub)
contains all the elements of @foo followed by all the elements
of @bar, followed by all the elements returned by the
subroutine named SomeSub.
A list value may also be subscripted like a normal array.
Examples:
$time = (stat($file))[8]; # stat returns array value
$digit = ('a','b','c','d','e','f')[$digit-10];
return (pop(@foo),pop(@foo))[0];
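A small sketch of list flattening and list subscripts, assuming a Perl 5
interpreter. The subroutine some_sub and the array contents are made up
for the example:

```perl
use strict;
use warnings;

sub some_sub { return ('s1', 's2'); }

my @foo = (1, 2);
my @bar = (3);

# Arrays lose their identity inside a LIST.
my @all  = (@foo, @bar, some_sub());
my $size = @all;

# A list value may be subscripted, with one index or several.
my $third   = (@foo, @bar, some_sub())[2];
my ($x, $y) = ('a', 'b', 'c', 'd')[1, 3];
```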
Array lists may be assigned to if and only if each element
of the list is an lvalue:
($a, $b, $c) = (1, 2, 3);
($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);
The final element may be an array or an associative array:
($a, $b, @rest) = split;
local($a, $b, %rest) = @_;
You can actually put an array anywhere in the list, but the
first array in the list will soak up all the values, and
anything after it will get a null value. This may be useful
in a local().
An associative array literal contains pairs of values to be
interpreted as a key and a value:
# same as map assignment above
%map = ('red',0x00f,'blue',0x0f0,'green',0xf00);
Array assignment in a scalar context returns the number of
elements produced by the expression on the right side of the
assignment:
$x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2
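The assignment rules above can be sketched together as follows, assuming a
Perl 5 interpreter (which spells local variables with "my"); all names are
invented for the example:

```perl
use strict;
use warnings;

my ($first, @rest)    = (1, 2, 3);    # @rest soaks up what is left
my (@all, $left_over) = (1, 2, 3);    # nothing remains for $left_over

my %map  = ('red', 0x00f, 'blue', 0x0f0, 'green', 0xf00);
my $blue = $map{'blue'};

my ($p, $q);
my $n = (($p, $q) = (3, 2, 1));       # counts the right side, not the left
```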
There are several other pseudo-literals that you should know
about. If a string is enclosed by backticks (grave
accents), it first undergoes variable substitution just like
a double quoted string. It is then interpreted as a command,
and the output of that command is the value of the
pseudo-literal, like in a shell. In a scalar context, a
single string consisting of all the output is returned. In
an array context, an array of values is returned, one for
each line of output. (You can set $/ to use a different
line terminator.) The command is executed each time the
pseudo-literal is evaluated. The status value of the command
is returned in $? (see Predefined Names for the
interpretation of $?). Unlike in csh, no translation is
done on the return data--newlines remain newlines. Unlike
in any of the shells, single quotes do not hide variable
names in the command from interpretation. To pass a $
through to the shell you need to hide it with a backslash.
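A sketch of backticks in both contexts. It assumes a Perl 5 interpreter on
a Unix-like system where the shell provides echo; the variable $word is
made up for the example:

```perl
use strict;
use warnings;

my $word = 'there';

# Scalar context: all the output as a single string.
my $all = `echo hi $word; echo lo $word`;

# Array context: one value per line of output, newlines kept.
my @lines = `echo hi $word; echo lo $word`;

my $status = $?;     # status of the most recent command
```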
Evaluating a filehandle in angle brackets yields the next
line from that file (newline included, so it's never false
until EOF, at which time an undefined value is returned).
Ordinarily you must assign that value to a variable, but
there is one situation where an automatic assignment happens.
If (and only if) the input symbol is the only thing
inside the conditional of a while loop, the value is
automatically assigned to the variable "$_". (This may seem
like an odd thing to you, but you'll use the construct in
almost every perl script you write.) Anyway, the following
lines are equivalent to each other:
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while $_ = <STDIN>;
print while <STDIN>;
The filehandles STDIN, STDOUT and STDERR are predefined.
(The filehandles stdin, stdout and stderr will also work
except in packages, where they would be interpreted as local
identifiers rather than global.) Additional filehandles may
be created with the open function.
If a <FILEHANDLE> is used in a context that is looking for
an array, an array consisting of all the input lines is
returned, one line per array element. It's easy to make a
LARGE data space this way, so use with care.
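To try both forms of input without a real file, this sketch reads from a
string opened as a file, which is a Perl 5 facility and an assumption of
the example. The filehandle IN and the sample text are made up:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

open(IN, '<', \$text) or die "cannot open in-memory file: $!";
my @seen;
while (<IN>) {            # each line is assigned to $_, newline included
    chomp;
    push @seen, $_;
}
close(IN);

open(IN, '<', \$text) or die "cannot open in-memory file: $!";
my @all = <IN>;           # array context: every line at once
close(IN);
```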
The null filehandle <> is special and can be used to emulate
the behavior of sed and awk. Input from <> comes either
from standard input, or from each file listed on the command
line. Here's how it works: the first time <> is evaluated,
the ARGV array is checked, and if it is null, $ARGV[0] is
set to '-', which when opened gives you standard input. The
ARGV array is then processed as a list of filenames. The
loop
while (<>) {
    ... # code for each line
}
is equivalent to
unshift(@ARGV, '-') if $#ARGV < $[;
while ($ARGV = shift) {
    open(ARGV, $ARGV);
    while (<ARGV>) {
        ... # code for each line
    }
}
except that it isn't as cumbersome to say. It really does
shift array ARGV and put the current filename into variable
ARGV. It also uses filehandle ARGV internally. You can
modify @ARGV before the first <> as long as you leave the
first filename at the beginning of the array. Line numbers
($.) continue as if the input was one big happy file. (But
see example under eof for how to reset line numbers on each
file.)
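The following sketch exercises <> on two scratch files to show $. running
on across them. It assumes a Perl 5 interpreter with the standard
File::Temp module; the file names and contents are invented:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small files in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.txt') {
    open(OUT, '>', "$dir/$name") or die "cannot write $name: $!";
    print OUT "line from $name\n";
    close(OUT);
}

# Set @ARGV to our own list of files, then read them all with <>.
@ARGV = ("$dir/a.txt", "$dir/b.txt");
my @log;
while (<>) {
    chomp;
    push @log, "$.: $_";      # $. keeps counting across files
}
```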
If you want to set @ARGV to your own list of files, go right
ahead. If you want to pass switches into your script, you
can put a loop on the front like this:
while ($_ = $ARGV[0], /^-/) {
    shift;
    last if /^--$/;
    /^-D(.*)/ && ($debug = $1);
    /^-v/ && $verbose++;
    ... # other switches
}
while (<>) {
    ... # code for each line
}
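The switch loop can be tried on a fixed argument list. This is a sketch
assuming a Perl 5 interpreter; the command line shown is hypothetical, and
a guard on @ARGV is added so the loop also stops cleanly when every
argument is a switch:

```perl
use strict;
use warnings;

my ($debug, $verbose) = ('', 0);
@ARGV = ('-v', '-Dtrace', '--', 'input.txt');   # hypothetical command line

while (@ARGV && ($_ = $ARGV[0], /^-/)) {
    shift @ARGV;
    last if /^--$/;                  # "--" ends the switches
    /^-D(.*)/ && ($debug = $1);      # -Dvalue
    /^-v/     && $verbose++;         # -v may be repeated
}

# @ARGV now holds only the filenames left for <> to read.
```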
The <> symbol will return FALSE only once. If you call it
again after this it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will input from
STDIN.
If the string inside the angle brackets is a reference to a
scalar variable (e.g. <$foo>), then that variable contains
the name of the filehandle to input from.
If the string inside angle brackets is not a filehandle, it
is interpreted as a filename pattern to be globbed, and
either an array of filenames or the next filename in the
list is returned, depending on context. One level of $
interpretation is done first, but you can't say <$foo>
because that's an indirect filehandle as explained in the
previous paragraph. You could insert curly brackets to
force interpretation as a filename glob: <${foo}>. Example:
while (<*.c>) {
    chmod 0644, $_;
}
is equivalent to
open(foo, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<foo>) {
    chop;
    chmod 0644, $_;
}
In fact, it's currently implemented that way. (Which means
it will not work on filenames with spaces in them unless you
have /bin/csh on your machine.) Of course, the shortest way
to do the above is:
chmod 0644, <*.c>;
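A sketch of globbing in an array context, run against a scratch directory
so that it does not touch your own files. It assumes a Perl 5 interpreter
with the standard File::Temp module and a scratch path without spaces; the
file names are invented:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a few empty files to match against.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.c', 'b.c', 'notes.txt') {
    open(OUT, '>', "$dir/$name") or die "cannot write $name: $!";
    close(OUT);
}

# One level of $ interpretation happens before the pattern is globbed.
my @sources = <$dir/*.c>;
@sources = sort @sources;

my $changed = chmod 0644, @sources;   # chmod returns the number of files changed
```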